Golang Job: Site Reliability Engineer

Company

DT One

Location

São Paulo - Brazil

Job type

Full-Time

Golang Job Details

About Us

DT One is the world's leading B2B fintech network powering cross-border VAS remittances to over 5 billion users across 160 countries. Whether remittance senders choose banks, telcos, digital apps or retail, DT One is there to deliver smarter data-driven mobile solutions, empower financial inclusion, and provide much-needed connectivity to ensure that no one is left unconnected.

Founded in 2005, the DT One team is headquartered in Singapore with regional offices in Dubai, Miami, Nairobi and London.

For more information visit www.DTOne.com

Site Reliability Engineer

At DT One, we count on our Site Reliability Engineers (SREs) to empower our users with a rich feature set, high availability, and extreme performance. As we expand our platform infrastructure and applications, we are seeking a talented Site Reliability Engineer to maintain, improve, and flawlessly operate our environments. We are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a globally distributed team to develop real-world solutions and positive user experiences at every interaction.

Objectives of this Role

  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
  • Establish and guarantee service level objectives for platform infrastructure and applications.
  • Provide primary operational support and engineering for multiple large distributed software applications, including on-call shifts.
  • Build software and systems to manage network infrastructure, platform infrastructure, and applications.
  • Improve reliability, quality, security, and time-to-market of our suite of software solutions.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Document every action, turning findings into repeatable actions and then into future automation.

Required Skills and Qualifications

  • Bachelor's degree in computer science or another highly technical, scientific discipline.
  • Ability to program (structured and OO) in one or more high-level languages, such as Golang, Python, Ruby, or JavaScript.
  • Experience with AWS cloud infrastructure management and related services.
  • Experience with Infrastructure as Code and Configuration Management concepts and related tools and technologies, such as Terraform and Ansible.
  • Hands-on experience with Linux administration, command-line interface, and shell scripting.
  • Experience with dynamic resource management frameworks and technologies, such as Kubernetes and Nomad.
  • Experience with source code management tools and related workflows.
  • Experience with continuous integration and continuous deployment concepts and related tools and technologies, such as Jenkins and GitLab CI.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
  • Good communication skills in English.

Preferred Qualifications

  • Previous success in technical engineering.
  • Previous experience operating multiple large distributed software applications.
  • Previous experience defining and implementing deployment and release standards.
  • Experience with database administration and performance tuning for systems such as PostgreSQL, MySQL, Elasticsearch, and Redis.
  • Experience with monitoring tools, such as Prometheus, Datadog, and New Relic.
  • Experience with VPN configuration and administration.
  • Coding experience beyond simple scripts.
  • A strong mindset oriented toward site reliability principles.
  • Sharing and mentoring mindset.

Sound like you? Apply now!